Microcomputed tomography versus plethysmometer and electronic caliper in the measurements of lymphedema in the hindlimb of mice

Lymphedema affects 20% of women diagnosed with breast cancer. It is a pathology with no known cure. Animal models are essential to explore possible treatments to understand and potentially cure lymphedema. The rodent hindlimb lymphedema model is one of the most widely used. Different modalities have been used to measure lymphedema in the hindlimb of mice, and these are generally poorly assessed in terms of the interrater agreement; thus, there could be a risk of measuring bias and poor reproducibility. We examined the interrater agreement of µCT-scans, electronic caliper thickness of the paw and plethysmometer in the measurement of lymphedema in the hindlimb of mice. Three independent raters assessed 24 C57BL6 mice using these three modalities four times (week 1, 2, 4 and 8) with a total of 96 samples. The mean interrater differences were then calculated. The interrater agreement was highest in the µCT-scans, with an extremely low risk of measurement bias. The interrater agreement in the plethysmometer and electronic caliper was comparable with a low to moderate risk of measurement bias. The µCT-scanner should be used whenever possible. The electronic caliper should only be used if there is no µCT-scanner available. The plethysmometer should not be used in rodents of this size.

www.nature.com/scientificreports/ photograph 5 . A recent study examined these conventional techniques and found the electronic caliper to have a high interrater agreement and the fewest outliers compared to the other two techniques 5 . Water displacement technique (plethysmometer) is often used in rodent hindlimb lymphedema research [10][11][12] . However, the interrater agreement has never been examined. In recent years, 3D hindlimb volumetry such as micro-computed tomography (µCT), magnetic resonance imaging (MRI) and high-resolution ultrasound (hrUS) have been introduced for rodent hindlimb volumetry 5,[13][14][15] . In 2016, Frueh et al. 5 examined these three modalities finding high interrater agreement among all three but with µCT as the modality with the lowest risk of measurement bias.
Two studies have investigated interrater agreement of µCT-scans and shown an extremely low risk of measuring bias 5,16 . To our knowledge, electronic caliper measurements have only been examined in a single study in terms of interrater agreement 5 , and the plethysmometer has never been examined. Thus, lymphedema studies on the hindlimbs of mice are being conducted without proper knowledge of possible measuring bias and reproducibility, and further research is needed to standardize parameters for measuring lymphedema in the hindlimb of mice.
The primary aim of this study was to examine the interrater agreement of µCT-scans, electronic caliper thickness of the paw and plethysmometer in the measurement of lymphedema in the hindlimb of mice. The secondary aim was to conduct a correlation analysis of the µCT-scans with the electronic caliper and plethysmometer. The population of interest was C57BL/6 mice, and the rater population of interest consisted of three medical doctors. The Guidelines for Reporting Reliability and Agreement Studies (GRRAS) 17 were applied.

Results
The results are presented in separate paragraphs for measurements, interrater agreement and correlation analysis.
Measurements. The µCT-scans were measured in cubic millimeters (mm 3 ), the plethysmometer in milliliters (ml) and the electronic caliper in millimeters (mm).
µCT-scans. The mean volume across all mice for the lymphedema hindlimb was 220.1 mm 3 and 153.7 mm 3 for the control hindlimb.
Plethysmometer. The mean volume across all mice for the lymphedema hindlimb was 0.09 ml, and 0.06 ml for the control hindlimb.

Discussion
In this study, we examined the interrater agreement of plethysmometer, electronic caliper and µCT-scans in the measurement of lymphedema in mice. Subsequently, we did a correlation analysis between µCT-scans and the two conventional modalities (plethysmometer and electronic caliper). Twenty-four mice were included in this study. Lymphedema was induced by irradiation and surgery, and the mice were measured with µCT-scans, electronic caliper and plethysmometer in weeks 1, 2, 4 and 8 by three raters. The estimated mean difference for the hindlimbs between the three raters was then calculated. The three different measurement modalities are discussed in three different paragraphs. The correlation analysis is likewise discussed in a separate paragraph.
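As an illustration of what a mean interrater difference is, the following minimal Python sketch computes pairwise mean differences between raters on paired readings. It is not the statistical model used in this study: the function name, the sign convention (first rater minus second rater), the normal-approximation confidence interval and the example volumes are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import NormalDist, mean, stdev


def pairwise_mean_differences(readings, level=0.95):
    """Return {(rater_a, rater_b): (mean_diff, ci_low, ci_high)}.

    `readings` maps a rater label to a list of measurements in which
    index i refers to the same sample (same mouse, same week) for every
    rater. Differences are taken as rater_a minus rater_b (assumed
    convention). The interval uses a normal approximation, which is a
    simplification.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    results = {}
    for a, b in combinations(readings, 2):
        diffs = [x - y for x, y in zip(readings[a], readings[b])]
        m = mean(diffs)
        half_width = z * stdev(diffs) / len(diffs) ** 0.5
        results[(a, b)] = (m, m - half_width, m + half_width)
    return results


# Hypothetical hindlimb volumes in mm^3, invented for illustration only.
example = {
    "R1": [220.0, 215.5, 231.2, 208.9],
    "R2": [220.6, 216.1, 232.3, 209.4],
    "R3": [220.9, 216.0, 232.0, 210.1],
}

for pair, (m, low, high) in pairwise_mean_differences(example).items():
    print(pair, f"{m:.2f} mm^3, 95% CI [{low:.2f}, {high:.2f}]")
```

A mean difference close to zero with a narrow interval suggests that two raters do not differ systematically, which is how the values for each modality are interpreted in the paragraphs that follow.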
Plethysmometer. The mean interrater differences for the lymphedema hindlimb for the plethysmometer were 0 ml, − 0.007 ml and 0.004 ml between the three raters. The volumes of the hindlimbs ranged from 0.02 to 0.25 ml with the mean volume being 0.09 ml for the lymphedema hindlimb. Therefore, the highest mean interrater difference equals 7.78% of the mean hindlimb volume, and the lowest difference equals 0% of the mean volume.
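The percentages above follow from dividing the absolute mean interrater difference by the mean hindlimb volume. A minimal Python sketch of that arithmetic, using the reported plethysmometer values (the function and variable names are hypothetical):

```python
def relative_difference_percent(mean_difference, mean_value):
    """Absolute mean interrater difference as a percentage of the mean value."""
    return abs(mean_difference) / mean_value * 100


# Plethysmometer values reported above for the lymphedema hindlimb (ml).
mean_volume_ml = 0.09
differences_ml = [0.0, -0.007, 0.004]

percentages = [
    relative_difference_percent(d, mean_volume_ml) for d in differences_ml
]
print([round(p, 2) for p in percentages])
```

The largest value corresponds to the 7.78% quoted above and the smallest to 0%.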
The mean differences of 0 ml and 0.004 ml indicate that the plethysmometer has a low risk of measurement bias, while − 0.007 ml indicates a moderate risk of measurement bias.
Overall, the plethysmometer has a low to moderate risk of measurement bias. The narrow range of volumes (75% of the lymphedema hindlimbs being 0.11 ml or less) should theoretically lead to low mean differences, which is the case for R1 vs R2 (0 ml) and R2 vs R3 (0.004 ml) but not for R1 vs R3 (− 0.007 ml). The narrow range of values was due to the small size of the mice and the plethysmometer's lowest detectable difference (0.01 ml). A narrow range of values increases the risk of artificially low mean interrater differences and thus of a biased high interrater agreement.
The lowest detectable difference was 0.01 ml in the plethysmometer that we used. 0.01 ml equals 10 mm 3 and is a considerable amount in mice of this size, where the mean control hindlimb is 154 mm 3 over 8 weeks. In contrast, the mean difference between rater R2 and R3 measured by the µCT was 0.07 mm 3 equaling 0.00007 ml.
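The unit comparison above can be checked with a short Python sketch. The only fact used beyond the reported values is that 1 ml equals 1000 mm 3 ; the helper names are hypothetical, and the share of the control volume is simply derived from the two reported numbers.

```python
MM3_PER_ML = 1000.0  # 1 ml = 1 cm^3 = 1000 mm^3


def ml_to_mm3(volume_ml):
    """Convert milliliters to cubic millimeters."""
    return volume_ml * MM3_PER_ML


def mm3_to_ml(volume_mm3):
    """Convert cubic millimeters to milliliters."""
    return volume_mm3 / MM3_PER_ML


# Lowest detectable difference of the plethysmometer, in mm^3.
resolution_mm3 = ml_to_mm3(0.01)

# The same step as a share of the mean control hindlimb volume (154 mm^3).
share_of_control_percent = resolution_mm3 / 154 * 100

# Mean µCT difference between R2 and R3 (0.07 mm^3), expressed in ml.
uct_difference_ml = mm3_to_ml(0.07)

print(f"{resolution_mm3:.0f} mm^3, {share_of_control_percent:.1f}% of control volume")
print(f"{uct_difference_ml:.5f} ml")
```

On these numbers, a single resolution step of the plethysmometer corresponds to roughly 6.5% of the mean control hindlimb volume.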
The size of the mice, and the water principle itself, made it difficult to standardize the measurements when inserting the hindlimb of the mice into the water. A few millimeters of deeper or shallower insertion into the water yielded 0.01 ml of difference. It is also important to note that every time a hindlimb is inserted into and removed from the water, a small amount of water will adhere to the hindlimb and therefore be removed from the plethysmometer. When water is removed, the subsequent measurement will not take the new water level into account unless the plethysmometer is recalibrated. Ideally, the plethysmometer should be recalibrated after each measurement, whereas the manufacturer only recommends calibration at the beginning of the measurements. The full calibration took 20 min, which made calibration after each measurement practically impossible. Calibration was done approximately every fifth mouse, making measuring with the plethysmometer a lot more complicated than anticipated.
The plethysmometer is being used in lymphedema research in mice and was most recently used by Hayashida et al. 10 . They used a plethysmometer from Muromachi (MK-101 CMP; Muromachi Kikai Co., Ltd., Tokyo, Japan). According to the manufacturer, it can detect changes as low as 0.001 ml for mice and 0.01 ml for rats 18 . This is perhaps an overestimation due to the water principle. This assertion is backed by Shioiya et al. 11 . They studied lymphedema in the hindlimb of mice with a plethysmometer from Ugo Basile (Gemonio, Italy). This plethysmometer has, in agreement with the plethysmometer used in this study, a lowest detectable difference of 0.01 ml 19 . We found two other plethysmometers with 0.01 ml as the lowest detectable difference 20,21 , and one that did not specify it 22 . We could not find another plethysmometer with the same claim of 0.001 ml accuracy.
However, the plethysmometer from Muromachi should be examined as it can potentially be more sensitive than other plethysmometers from other manufacturers.
The plethysmometer is probably better suited for animals of a bigger size. Shejawal et al. 12 assessed the plethysmometer in the hindlimb of 18 rats weighing 180-220 g. In comparison, our mice weighed approximately 20 g. They found a high correlation between the rats' hindlimbs and different known volumes inserted in the water, but they did not examine the interrater agreement 12 . Further studies are needed to examine the interrater agreement of the plethysmometer in rats. The highest mean interrater difference for our control hindlimb, which did not have lymphedema and was thus a lot smaller than the lymphedema hindlimb, was 17.2%, compared with 7.8% for the lymphedema hindlimb. This indicates that the plethysmometer might have a smaller risk of measurement bias for bigger animals.
The water principle, which is used in the plethysmometer, was examined by Pan et al. 23 on a mouse tail model. Pan et al. found highly variable measurements within the same tail resulting in a high risk of measurement bias 23 . The tail of mice should theoretically be better suited for a water displacement technique, as it is easily insertable in the small tube, can be easily standardized and has a lower risk of bias in terms of shallower/deeper insertion of the tail as its diameter is smaller than that of a hindlimb. Still, Pan et al. found highly variable measurements 23 . A plethysmometer is relatively expensive (although less so than a µCT-scanner), but once the plethysmometer is bought, there are no ongoing expenses.
µCT-scans. The mean interrater differences for the µCT-scans for the lymphedema hindlimbs were − 0.73 mm 3 , − 0.81 mm 3 and − 0.07 mm 3 between the raters. The mean volume of the lymphedema hindlimb was 220.1 mm 3 . Therefore, the highest mean interrater difference equals 0.37% of the mean hindlimb volume, and the lowest difference equals 0.03% of the mean volume. This indicates an extremely low risk of measuring bias in agreement with our previous study 16 and Frueh et al. 5 .
The mean differences for the control hindlimb were similar, with the lowest being − 0.13 mm 3 , 95% CI [− 0.85, 0.59], between R1 and R3 and the highest being 0.69 mm 3 , 95% CI [− 0.16, 1.53], between R2 and R3. With 153.7 mm 3 as the mean volume, these equal 0.08% and 0.45%, respectively. The results of the control hindlimbs are in agreement with those of the lymphedema hindlimbs and underline the extremely low risk of measurement bias of the µCT-scans.
It should be noted that this study used Inveon Research workplace (version 4.2, IRW; Siemens Healthcare, Ballerup, Denmark) as the software for measuring the volume through the µCT-scans. It is unclear whether the same results can be obtained by different software. This should be investigated in future studies.
A µCT-scanner is the most expensive of the modalities, and there are ongoing expenses as each µCT-scan carries a cost.
Electronic caliper. The mean interrater differences for the electronic caliper for the lymphedema hindlimbs were 0.39 mm, 0.21 mm and − 0.18 mm between the three raters. The mean thickness of the paw was 3.27 mm. Therefore, the highest mean interrater difference equals 11.92% of the mean paw thickness, and the lowest difference equals 5.50% of the mean paw thickness.
The results of the control hindlimbs are comparable with the lymphedema hindlimbs (2.27% lowest and 17.6% as the highest).
The mean differences between the raters are relatively low and indicate a low to moderate risk of measurement bias.
The thickness of the paw, measured with an electronic caliper, is used as a surrogate parameter for hindlimb volume in lymphedema research on mice and was most recently used by Daneshgaran et al. 24 .
The electronic caliper is theoretically a good instrument in measuring the paw size in mice, as it can detect minimal changes in paw thickness of 0.01 mm.
It is the cheapest option of our three modalities as it costs less than the plethysmometer and a µCT-scanner, and there are no ongoing expenses once the caliper is bought. The caliper is easy to use.
A limitation is that the paw of the mice is soft and can be depressed by pressure of the caliper. Thus, it can be speculated that just a tiny amount of pressure can result in significant variations of results.
Frueh et al. 5 assessed the electronic caliper in the mouse hindlimb and found a high interrater agreement, although not as high as µCT. Sharma et al. 25 compared the electronic caliper vs a plethysmometer from Ugo Basile (Gemonio, Italy) on the hindlimb of mice and found the caliper to be more sensitive. No interrater agreement was examined. The electronic caliper is also being used in the mice lymphedema tail models, as seen in Ghanta et al. 26 , but the interrater agreement is yet to be examined.
Correlation analysis. µCT had previously shown an extremely low risk of measurement bias 5,16 and was, for that reason, chosen as the reference standard in the correlation analysis. The overall correlation coefficient between the plethysmometer and the µCT-scans was 0.56, 95% CI [0.42-0.70] (p < 0.0001), indicating a moderate correlation. The control hindlimb showed no correlation for raters combined across both modalities. The control hindlimb is significantly smaller than the lymphedema hindlimb. This highlights the issue of measuring small volumes with the plethysmometer and electronic caliper.
It is important to note that in order to do the correlation analysis, we had to standardize the measurements of the three different modalities, and therefore there is a risk of bias. Because of this risk, our conclusion is solely based upon the interrater agreement.
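For readers unfamiliar with the quantity reported, the following Python sketch shows how a Pearson correlation coefficient and an approximate 95% confidence interval can be computed for paired readings from two modalities. It is not the analysis performed in this study: the standardization step is not reproduced, the Fisher z interval is an assumed method, and the paired readings are invented for illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transform.

    Requires n > 3 and |r| < 1. This is an assumed method, not
    necessarily the one behind the interval reported above.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = atanh(r)
    half_width = z / sqrt(n - 3)
    return tanh(centre - half_width), tanh(centre + half_width)


# Hypothetical paired readings, invented for illustration only.
uct_mm3 = [150.0, 180.0, 210.0, 240.0, 260.0, 300.0]
pleth_ml = [0.06, 0.07, 0.09, 0.08, 0.11, 0.12]

r = pearson_r(uct_mm3, pleth_ml)
low, high = fisher_ci(r, len(uct_mm3))
print(f"r = {r:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Because the Pearson coefficient is unchanged by linear rescaling, the differing units (mm 3 and ml) do not by themselves affect r in this sketch.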

Conclusion
µCT-scans were superior to the electronic caliper and the plethysmometer when assessing lymphedema in the hindlimb of mice. The electronic caliper was comparable to the plethysmometer, both having a low to moderate risk of measurement bias. See Table 5 and Fig. 7.
The plethysmometer had a low range of values due to the lowest detectable difference of 0.01 ml and the small size of the hindlimb of the mice. This increased the risk of a biased high interrater agreement. Therefore, the plethysmometer should not be used in rodents of this size to assess the lymphedema of the hindlimbs. The plethysmometer might be a better instrument in rodents of a bigger size. This should be investigated in future studies.
The electronic caliper should only be used if there is no µCT-scanner available. The µCT-scans have an extremely low risk of measurement bias in the assessment of hindlimb lymphedema in mice and should be used whenever possible.

Materials and methods
The mice were measured with µCT-scans, electronic caliper and plethysmometer in weeks 1, 2, 4 and 8 by three raters: R1 (AB), R2 (AW) and R3 (FD). All three raters measured all mice at all weeks. The absolute number of samples was, therefore, 24 mice × 4 weeks = 96 samples. The CLSI recommendation is at least 40 samples 27 . Weeks 1 and 8 were chosen to get the broadest range of values. The absolute number of samples and the broad range of values are in accordance with the guidelines from CLSI 27 . Measurements were conducted independently and were blinded between raters. The ARRIVE guidelines 28 were followed to the extent possible, as there are natural limitations to the guidelines because this is not an experimental study.
Plethysmometer and electronic caliper measurements in week 1 are unavailable because the machine administering the anesthetic gases did not work properly, and measurements taken while the mice were awake were deemed insufficient. Measurements in weeks 2, 4 and 8 were conducted properly. We had prior experience with µCT, and these scans were done with anesthesia injections in all weeks. Although the results from week 1 were not available, we still had 57 samples in weeks 2, 4 and 8, which is in accordance with the Clinical and Laboratory Standards Institute's (CLSI) recommendation of at least 40 samples 27 .
Animals. The National Animal Inspectorate in Denmark approved this study (2018-15-0201-01445), which includes an ethical approval. All experiments were conducted according to the national laws of animal research.
Twenty-four 9-week old female C57BL6 mice from Janvier (Janvier Labs, Le Genest-Saint-Isle, Saint-Berthevin Cedex, France) were used in this study. The mice were acclimatized for 7 days preoperatively.
Postoperatively the mice were housed individually and received oral analgesic treatment (Buprenorphine, 0.2 mg/g) daily for 3 days. They were maintained at a normal 12-h day/night cycle at 21 degrees Celsius with a humidity of 45-55%. They were fed a standard diet and water ad libitum. The mice were euthanized by cervical dislocation under anesthesia at the end of the study.
During the study, four mice were euthanized for ethical reasons by the veterinarian due to poor wound healing. One died during anesthesia while being µCT-scanned.
Lymphedema model. The mice had lymphedema induced with a previously described model 29 with minor modifications.
Briefly, the lymphedema was established in four separate procedures: two irradiations before surgery, the surgery itself and one irradiation after surgery.
The hindlimbs of the mice were irradiated with a total dose of 22.5 Gray (Gy) in three fractions on a Gulmay D3100 X-ray instrument (Xstrahl, Camberley, UK) with a dose rate of 5.11 Gy/min (100 kVp, 10 mA, HVL 2.53 Al). The irradiation was performed 7 and 3 days before the surgery and 3 days after surgery.
The surgery consisted of microsurgery of the right hindlimb. The lymph vessels were tied with 10-0 suture and two lymph nodes were dissected. The surgical and irradiation details are explained and shown in a video by Wiinholt et al. 29 .
The modifications consisted of irradiation with three fractions of 7.5 Gy instead of two fractions of 10 Gy.
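The fractionation arithmetic can be sketched in a few lines of Python. The total doses follow directly from the values stated above; the beam-on time per fraction is only an illustrative estimate that assumes the stated dose rate is constant and applies at the target, which is not reported here. All variable names are hypothetical.

```python
# Values stated in the text.
fractions = 3
dose_per_fraction_gy = 7.5
dose_rate_gy_per_min = 5.11

# Total dose of the modified protocol: 3 x 7.5 Gy.
total_dose_gy = fractions * dose_per_fraction_gy

# Total dose of the original protocol: 2 x 10 Gy.
original_total_dose_gy = 2 * 10.0

# Assumption: constant dose rate at the target, no setup or ramp-up time.
beam_on_seconds = dose_per_fraction_gy / dose_rate_gy_per_min * 60

print(f"modified protocol: {total_dose_gy} Gy, original: {original_total_dose_gy} Gy")
print(f"estimated beam-on time per fraction: {beam_on_seconds:.0f} s")
```

Under these assumptions, the modified protocol delivers 2.5 Gy more in total than the original one, spread over one additional fraction.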
µCT-scans. The µCT-scans were performed on a Siemens INVEON multimodality pre-clinical scanner (Siemens pre-clinical solutions, Knoxville, TN, USA) (Fig. 8). The animals were anesthetized with 1.5-2% isoflurane (ScanVet Animal Health, Fredensborg, Denmark) mixed with 100% oxygen during the scans. The mice were placed front feet first in prone position on a heated animal bed (38 mm). The standardization, assessment and analysis of the µCT-scans have previously been described 16 . Briefly, the distal tibiofibular joint was chosen as the upper volumetric boundary limit. The volume was then calculated in Inveon Research Workplace software, version 4.2 (IRW; Siemens Healthcare, Ballerup, Denmark). See Fig. 8.
To standardize the measurements, the musculotendinous junction of the gastrocnemius muscle was used as a landmark in the plethysmometer measurements 10 . The hindlimb of the mice was inserted in the water until the junction. The volumetric measurements were then noted.
Electronic caliper. The electronic caliper used was an Insize Digital Caliper Series 1108 (Insize Co. LTD, China). The animals were anesthetized with 1.5-2% isoflurane (ScanVet Animal Health, Fredensborg, Denmark) mixed with 100% oxygen during the measurements. The paw thickness was measured transversely between the first and second proximal pad of the paw with the electronic caliper. See Fig. 8.
Raters. None of the raters had prior experience with the plethysmometer or the electronic caliper. All raters had experience with the µCT-scans. R1 (AB) had approximately 25 h of experience, while R2 (AW) and R3 (FD) had approximately 50 h of experience. The plethysmometer and electronic caliper techniques require no prior training, while the µCT-scans require a basic understanding of CT imaging, e.g., a Bachelor of Science in Medicine, and approximately 1 h of training. All raters were blinded to each other's measurements.